CS 224N Final Project: Unsupervised Clustering of People, Places, and Organizations in Wikileaks Cables with NLP Cues

Authors

  • Xuwen Cao
  • Beyang Liu
Abstract

Our goal is to extract the names of key entities from written U.S. diplomatic communications and then to apply natural-language- and sentiment-based clustering to these entities, using contextual features extracted from the data. We extract entities using an off-the-shelf statistical NLP package and then seek to generate meaningful clusters of entities in an unsupervised fashion. To do so, we experiment with different models, feature sets, and clustering algorithms. For the purposes of this project, we define key entities to be people, nations, and organizations that occur at least k times in the dataset. We determined k through empirical evaluation of the entities extracted from the data and according to the needs of different scenarios.
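The frequency threshold k and the contextual features described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the (entity, context words) input format, and the sample mentions are all hypothetical, and the entity mentions are assumed to have already been produced by a separate named-entity recognizer.

```python
from collections import Counter, defaultdict


def key_entities(mentions, k):
    """Return the set of entities that are mentioned at least k times."""
    counts = Counter(entity for entity, _ in mentions)
    return {entity for entity, n in counts.items() if n >= k}


def context_features(mentions, keep):
    """Build a bag-of-words context vector for every entity in `keep`."""
    features = defaultdict(Counter)
    for entity, context in mentions:
        if entity in keep:
            features[entity].update(word.lower() for word in context)
    return dict(features)


# Hypothetical sample data: (entity, words surrounding the mention).
mentions = [
    ("Iran", ["sanctions", "nuclear"]),
    ("Iran", ["nuclear", "talks"]),
    ("NATO", ["alliance", "summit"]),
    ("NATO", ["summit", "troops"]),
    ("Karzai", ["election"]),
]

keep = key_entities(mentions, k=2)
features = context_features(mentions, keep)
```

With k = 2, only the entities mentioned twice survive the filter; the resulting context vectors are the kind of input an unsupervised clustering algorithm could then group.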


Similar resources

CS 224N Final Project: Adaptive Language Models

To learn a language, one should be exposed to the right environment. A person who has used English for his or her entire life would never understand a single Japanese word. Even within the same language, people use the language very differently from person to person and from occasion to occasion. For instance, you would find English written in an NLP research paper (i.e. this one) very different from...


CS 224N Final Project: Speech Summarization for Rapid Playback

Searching audio streams is currently a tedious task. There exist no tools that allow a user to quickly scan through a long video lecture or audio book using only audio cues. In this paper I propose and build a simple system that alleviates the burden of searching through a large amount of audio data. This system uses concepts from other text and speech summarization work, but focuses on the prob...


CS 224N Default Final Project: Question Answering

This assignment can be completed in groups of up to 3 people. We encourage groups to work together productively so that all students understand the submitted system well. We ask that you abide by the university Honor Code and that of the Computer Science department, and make sure that all of your submitted work (except as acknowledged) is done by yourself and your team members only. Please revi...



Publication date: 2011